Reconciling Inconsistent Data in Probabilistic XML Data Integration
Author
Abstract
Dealing with inconsistent data while integrating XML data from different sources is necessary to improve the quality of data integration. Typically, data cleaning (or repairing) procedures are applied to remove inconsistencies, i.e. conflicts between data. In this paper, we present a probabilistic XML data integration setting. Each data source is assigned a probability that models its reliability level. As a result, every answer (a tuple of values from XML trees) also has a probability assigned to it. The problem is how to compute this probability, especially when the same answer is produced by many sources. We consider three semantics for computing such probabilistic answers: by-peer, by-sequence, and by-subtree semantics. The probabilistic answers can be used to resolve a class of inconsistencies that violate XML functional dependencies defined over the target schema. Given a probability distribution over a set of conflicting answers, we can choose the one with the highest probability of being correct.
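The idea of scoring conflicting answers by source reliability can be illustrated with a minimal sketch. It does not implement the by-peer, by-sequence, or by-subtree semantics, which the abstract names but does not define. Instead it assumes, purely for illustration, that sources are independent and combines their reliabilities with a noisy-OR rule, then normalizes over the conflicting answers. The function names, the example values, and the reliabilities are all hypothetical.

```python
from math import prod


def combine_independent(reliabilities):
    """Noisy-OR combination (an assumption, not the paper's semantics):
    probability that at least one of the supporting sources is right,
    treating the sources as independent."""
    return 1.0 - prod(1.0 - p for p in reliabilities)


def answer_distribution(support):
    """Map each conflicting answer to a normalized score.

    `support` maps a candidate answer to the list of reliabilities of
    the sources that produced it (hypothetical input format).
    """
    scores = {answer: combine_independent(ps) for answer, ps in support.items()}
    total = sum(scores.values())
    if total == 0:
        # No source carries any weight; return zeros instead of dividing by 0.
        return {answer: 0.0 for answer in scores}
    return {answer: score / total for answer, score in scores.items()}


def choose_answer(support):
    """Pick the answer with the highest normalized score."""
    distribution = answer_distribution(support)
    best = max(distribution, key=distribution.get)
    return best, distribution


# Two conflicting values for the same key of a functional dependency:
# "Warsaw" comes from two sources, "Poznan" from one more reliable source.
support = {"Warsaw": [0.6, 0.5], "Poznan": [0.7]}
best, distribution = choose_answer(support)
```

In this example the answer backed by two moderately reliable sources outranks the one backed by a single, more reliable source. Under a different combination rule the ranking could change, which is why the choice of semantics matters.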
Similar resources
Probabilistic XML: Models and Complexity
Uncertainty in data naturally arises in various applications, such as data integration and Web information extraction. Probabilistic XML is one of the concepts that have been proposed to model and manage various kinds of uncertain data. In essence, a probabilistic XML document is a compact representation of a probability distribution over ordinary XML documents. Various models of probabilistic ...
Tuple Merging in Probabilistic Databases
Real-world data are often uncertain and incomplete. In probabilistic relational data models, uncertainty can be modeled on two levels: first, by representing the uncertain instance of a tuple by a set of possible instances, and second, by assigning each tuple its degree of membership in the considered relation. To overcome incompleteness, data from multiple sources need to be combined. In orde...
Probabilistic XML functional dependencies based on possible world model
With the increase of uncertain data in many new applications, such as sensor networks, data integration, and web extraction, uncertainty in both relational databases and XML datasets has attracted more and more research interest in recent years. As functional dependencies (FDs) are critical and necessary to schema design and data rectification in relational databases and XML datasets, it is a...
Probabilistic XML in Information Integration
Information integration is a difficult research problem. In an ambient environment, where devices can connect and disconnect arbitrarily, the problem only increases, because data sources may become available at any time, but can also disappear. In such an environment, information integration needs to be unattended, because information integration opportunities arise on an ad hoc basis. We propose ...
User Feedback in Probabilistic XML
Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example, in an ambient environment where devices on an ad hoc basis have to exchange information autonomously. We h...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008